Time Period: Week 4 (Jan 30 - Feb 2)
This notebook is based on the csv extracted from Quercus from individual "Reflections" pages, by clicking on the "Student Analysis" tab. All student personal data has been removed, and the notebook only deals with aggregated groups (i.e. lecture assigned, engineering discipline).
General sources (kept between weeks in class_data folder):
- ClassList-20231-APS106H1-S.csv: sheet from department
- vader_lexicon.txt: used for sentiment analysis

Raw data week4_raw.csv is coming from Student Analytics in Quercus.
This notebook is broken into the following sub-sections:
# define imports
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import plotly
from wordcloud import WordCloud
from wordcloud import ImageColorGenerator
from wordcloud import STOPWORDS
import warnings
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
# read student data
student_data_path = "../class_data/ClassList-20231-APS106H1-S.csv" # switched the file to csv after getting errors with excel
students_df = pd.read_csv(student_data_path)
# read in reflection data
reflection_path = "week4_raw.csv"
reflections_df = pd.read_csv(reflection_path)
reflections_df.columns
Index(['name', 'id', 'sis_id', 'section', 'section_id', 'section_sis_id',
'submitted', 'attempt',
'3470025: What lecture section(s) did you attend, or do you plan to attend, inWeek 4? If multiple sections, please check them all.',
'1.0',
'3470027: How many lectures did you attend, or do you plan to attend, inWeek 4?',
'1.0.1',
'3470028: Which tutorial session did you attend, or do you plan to attend, inWeek 4?',
'1.0.2',
'3470031: Did you spend or plan to spend any time working onPractice ProblemsinWeek 4?',
'1.0.3',
'3470038: InWeek 4 during the 3rd lecture of the week, there was a midterm review. Did this review session help improve your understanding of the course material and prepare you for the midterm?',
'1.0.4',
'3470039: Did you attend, or do you plan to attend a lab/practical session, inWeek 4?',
'1.0.5',
'3470040: At the end ofWeek 4, do you feel on top of the course material or are you feeling that you're falling behind?',
'1.0.6',
'3470041: Please write one word that describes how you feel afterWeek 4of APS106.',
'1.0.7',
'3470043: Please list any and all topics that you are confused about or struggling with. List topics of confusion separated by commas. If you are not confused by anything, please leave it blank.\nExample: functions, variables, syntax errors, print, operators',
'1.0.8',
'3470046: Your first term test is next week. How prepared are you at this point?',
'1.0.9',
'3470052: Let's say you're asked to write a function to calculate and return the area of a circle. The function is called circle_areaand is used as follows:\n area = circle_area(1)\n print(area)\n3.1459\nWould you be confident choosing which of the following three functions is correct according to the use case above?\nimport math\n\ndef circle_area(radius):\n print(math.pi * radius**2)\n\ndef circle_area(radius):\n return math.pi * radius**2\n\ndef circle_area(radius):\n return print(math.pi * radius**2)',
'1.0.10',
'3470053: This week in lectures we learned about While loops. Please indicate your level of understanding regarding the concept of While loops.',
'1.0.11', 'n correct', 'n incorrect', 'score'],
dtype='object')
# 1 - data cleansing: remove the "attempt" and question score columns auto-generated by Quercus
reflections_df = reflections_df.drop(columns=['attempt','1.0', '1.0.1','1.0.2','1.0.3','1.0.4',
    '1.0.5','1.0.6','1.0.7','1.0.8','1.0.9','1.0.10','1.0.11', 'n correct', 'n incorrect', 'score'])
# 2 - rename columns in dataframe so they are shorter
current_column_names = list(reflections_df.columns)
column_names = {
current_column_names[7]: "lecture_sections",
current_column_names[8]: "lecture_count",
current_column_names[9]: "tutorial_sections",
current_column_names[10]: "practice_problems",
current_column_names[11]: "midterm_review",
current_column_names[12]: "practical_check",
current_column_names[13]: "progress_check",
current_column_names[14]: "comment_one_word",
current_column_names[15]: "confused_topics",
current_column_names[16]: "term_test_check",
current_column_names[17]: "function_return_check",
current_column_names[18]: "while_loop_check"
}
reflections_df.rename(columns=column_names, inplace=True)
# 3 - join reflection data with student data based on UTORid
combined_df = reflections_df.merge(students_df, left_on = "sis_id", right_on = "UTORid", how="left")
# 4 - rename disciplines from the POSt Code
combined_df['POSt Code'] = combined_df['POSt Code'].str.rstrip("X")
disciplines = {
"AECHEBASC": "Chemical",
"AECIVBASC": "Civil",
"AEMECBASC": "Mechanical",
"AELMEBASC": "Mineral",
"AEINDBASC": "Industrial",
"AEMMSBASC": "Materials",
"AEENGBASC": "Track-One",
"AE NDEGI": "Non-Degree"
}
combined_df['discipline'] = combined_df['POSt Code'].map(disciplines)
combined_df = combined_df.drop(columns=['sis_id','UTORid','name', 'Surname', 'Given Name', #'Person ID',
'Title', 'Year', 'Reg. Sts', 'Enr. Sts','Email Address','id' ])
# 5 - flag each lecture section attended + reformat to match student information sheet
# na=False counts a missing answer as "not attended"; without it, NaN is treated as a match by np.where
combined_df['LEC0101_attended'] = np.where(combined_df['lecture_sections'].str.contains("LEC01", na=False), "LEC0101", False)
combined_df['LEC0102_attended'] = np.where(combined_df['lecture_sections'].str.contains("LEC02", na=False), "LEC0102", False)
combined_df['LEC0103_attended'] = np.where(combined_df['lecture_sections'].str.contains("LEC03", na=False), "LEC0103", False)
# True when any attended section matches the assigned one; does not account for students missing from the class list!
combined_df['lecture_assigned'] = (
    (combined_df['LEC0101_attended'] == combined_df['Lecture'])
    | (combined_df['LEC0102_attended'] == combined_df['Lecture'])
    | (combined_df['LEC0103_attended'] == combined_df['Lecture'])
)
### ADD POLARITY TO LONG ANSWERS ###
# 1 - remove all punctuation from student paragraph, lower all strings
punctuations = '!"#$%&\'()*+-./:;<=>?@@[\\]^_`{|}~“'
esc_char = "\xa0"
def remove_punctuations(text):
    text = str(text)
    for x in punctuations:
        if (x in text):
            text=text.replace(x," ")
    return text
combined_df['cleaned_answer'] = combined_df.apply(lambda row: remove_punctuations(text=row['comment_one_word']),axis=1)
combined_df['cleaned_answer'] = combined_df['cleaned_answer'].str.lower()
# 2 - each word from the reflection will be separated into its own row in a new dataframe
word_lookup = (combined_df['cleaned_answer'].str.split(' ', expand=True).stack().reset_index(name='word'))
word_lookup = word_lookup[word_lookup.word != '']
word_lookup = word_lookup.set_index('level_0',drop=True).rename(columns={'level_1':'num'})
# 3 - introduce lexicon to look up polarity values for each word
file = open('../class_data/vader_lexicon.txt', 'r').read().split('\n')
lexicon_text = [x for x in file[0:]]
lexicon = pd.DataFrame(lexicon_text,columns=['data'])
lexicon = lexicon['data'].str.split('\t',expand=True)
lexicon = lexicon[[0,1]].rename(columns={0: "word", 1: "polarity"}).set_index('word')
# 4 - look up each word inside the long answer, assign a polarity
word_lookup = word_lookup.join(lexicon, on ='word')
word_lookup.fillna(0, inplace=True)
# 5 - aggregate dataframe back to sum polarity per long answer
word_lookup[["polarity"]] = word_lookup[["polarity"]].apply(pd.to_numeric)
polarity = word_lookup.groupby(['level_0'])['polarity'].sum().to_frame()
# 6 - join polarity data back into combined_df
combined_df = combined_df.join(polarity)
# View DataFrame
combined_df.head(3)
| section | section_id | section_sis_id | submitted | lecture_sections | lecture_count | tutorial_sections | practice_problems | midterm_review | practical_check | ... | Lecture | Tutorial | Practical | discipline | LEC0101_attended | LEC0102_attended | LEC0103_attended | lecture_assigned | cleaned_answer | polarity | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | APS106H1-S-LEC0102-20231, APS106H1-S-PRA0106-2... | 272282, 272296, 272315 | APS106H1-S-LEC0102-20231, APS106H1-S-PRA0106-2... | 2023-02-06 15:33:42 UTC | LEC02 - Beck | 3.0 | TUT0107 (Tuesday, 11:00 - 12:00) | No | Very Helpful | Yes | ... | LEC0102 | TUT0107 | PRA0106 | Track-One | False | LEC0102 | False | True | nervous for the midterm | -1.1 |
| 1 | APS106H1-S-TUT0106-20231, APS106H1-S-PRA0106-2... | 272313, 272296, 272282 | APS106H1-S-TUT0106-20231, APS106H1-S-PRA0106-2... | 2023-02-06 15:32:37 UTC | LEC02 - Beck | 2.0 | TUT0106 (Monday, 13:00 - 14:00) | Yes | Didn't attend the lecture or watch the recording | Yes | ... | LEC0102 | TUT0106 | PRA0106 | Mechanical | False | LEC0102 | False | True | good | 1.9 |
| 2 | APS106H1-S-LEC0103-20231, APS106H1-S-TUT0108-2... | 272284, 272317, 272300 | APS106H1-S-LEC0103-20231, APS106H1-S-TUT0108-2... | 2023-02-06 15:20:10 UTC | LEC03 - Rosu | 3.0 | TUT0108 (Monday, 15:00 - 16:00) | Yes | Very Helpful | No | ... | LEC0103 | TUT0108 | PRA0108 | Materials | False | False | LEC0103 | True | nan | 0.0 |
3 rows × 28 columns
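The `polarity` values in the table above are sums of per-word lexicon scores. The following is a minimal plain-Python sketch of that idea, not the notebook's pandas pipeline: `TOY_LEXICON` and `polarity_of` are hypothetical names, and the two scores are the ones implied by the rows above (`good` = 1.9, `nervous` = -1.1), with every other word counting as 0.

```python
import string

# Hypothetical two-word lexicon; the notebook reads the full table from vader_lexicon.txt.
TOY_LEXICON = {"good": 1.9, "nervous": -1.1}

# Map every ASCII punctuation character to a space, similar to the clean-up step.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def polarity_of(text, lexicon):
    """Sum the lexicon score of each word; words missing from the lexicon add 0."""
    words = text.lower().translate(PUNCT_TO_SPACE).split()
    return sum(lexicon.get(word, 0.0) for word in words)


answers = ["Good", "nervous for the midterm", "Not good!"]
scores = {answer: polarity_of(answer, TOY_LEXICON) for answer in answers}
for answer, score in scores.items():
    print(f"{answer!r}: {score}")
```

Because each word is scored on its own, "Not good!" gets the same score as "Good": negation is ignored, which is worth keeping in mind when reading the most positive and most negative comments.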
# export cleaned dataframe into csv
combined_df.to_csv('reflection4.csv')
# number of students who completed reflection
completed = len(reflections_df)
not_complete = len(students_df) - completed
# share of students who completed the reflection
plt.figure(figsize=(7,7))
plt.title("Student % Completing Reflection")
values = [completed, not_complete]
name = ["Completed", "Not Completed"]
colors = ['#ADD8E6','#ffcc99','#98AFC7','#ff9999','#d7edb9','#c4bcc0','#67805c','#ab446c','#41734f','#eb4034']
explode = (0.1, 0)
plt.pie(values, labels=name, colors=colors, startangle=-180,autopct='%1.1f%%',explode=explode)
plt.show()
# plt.rcdefaults()
fig, ax = plt.subplots()
# completed vs. not completed counts
labels = ('Completed', 'Not Completed')
y_pos = np.arange(len(labels))
values = [completed, not_complete]
ax.barh(y_pos, values, align='center')
ax.set_yticks(y_pos, labels=labels)
ax.invert_yaxis() # labels read top-to-bottom
ax.set_xlabel('Number of Students')
ax.set_title('Number of Students Completing Reflections')
plt.show()
week = 4
fig, axs = plt.subplots(1, 3, figsize=(14, 7))
fig.suptitle("Week {}: Frequency of Student Attendance Per Lecture Section".format(week))
column_list = ["LEC0101_attended","LEC0102_attended","LEC0103_attended"]
lecture_list = [ x.split('_')[0] for x in column_list]
lecture_name = {
"LEC0101": "Kinsella & Goodfellow",
"LEC0102": "Beck",
"LEC0103": "Rosu"
}
for index, value in enumerate(column_list):
    data = combined_df[combined_df[value]==lecture_list[index]]
    sns.countplot(x=value, hue="lecture_count", data=data, ax=axs[index])
    axs[index].get_xaxis().set_visible(False)
    axs[index].set_ylim(0,210)
    axs[index].set_title(lecture_name[lecture_list[index]])
    # label each bar with its own height so the labels cannot drift from the bar order
    for p in axs[index].patches:
        height = p.get_height()
        if np.isfinite(height) and height > 0:
            axs[index].text(p.get_x()+p.get_width()/2., height + 0.1, int(height), ha="center")
plt.show()
fig, axs = plt.subplots(1, 3, figsize=(14, 7))
fig.suptitle("Week {}: % Frequency of Registered Student Attendance Per Lecture Section ".format(week))
column_list = ["LEC0101_attended","LEC0102_attended","LEC0103_attended"]
lecture_list = [ x.split('_')[0] for x in column_list]
lecture_name = {
"LEC0101": "Kinsella & Goodfellow",
"LEC0102": "Beck",
"LEC0103": "Rosu"
}
assigned_lecture = [
len(combined_df[combined_df["Lecture"]=="LEC0101"]),
len(combined_df[combined_df["Lecture"]=="LEC0102"]),
len(combined_df[combined_df["Lecture"]=="LEC0103"])
]
for index, value in enumerate(column_list):
    #filter down the dataframe only to the specific lecturer
    df_temp = combined_df[combined_df[value]==lecture_list[index]]
    df_temp = df_temp.groupby("lecture_count").agg('count')
    df_temp = df_temp[["section"]].copy().reset_index()
    df_temp['percent'] = df_temp['section'] / assigned_lecture[index] * 100
    sns.barplot(data=df_temp, x="lecture_count", y="percent", ax=axs[index])
    axs[index].set_ylim(0,100)
    axs[index].set_title(lecture_name[lecture_list[index]])
    values = list(df_temp['percent'])
    for i, p in enumerate(axs[index].patches):
        height = p.get_height()
        axs[index].text(p.get_x()+p.get_width()/2., height + 0.1,"{}%".format(round((values[i]),2)),ha="center")
plt.show()
plt.figure(figsize=(15,10))
plt.title('Lecture Attendance (Self-Reported) vs. Student Disciplines', fontsize = 20)
# combined_df["lecture_count"] = combined_df["lecture_count"].astype(str)
# combined_df["lecture_count"] = combined_df["lecture_count"].str.split('.').str[0]
#df_grouped['count'].str.split(".").str[0]
ax = sns.countplot(x="lecture_count", hue="discipline", data=combined_df[combined_df['lecture_count']!=0])
# for container in ax.containers:
# ax.bar_label(container)
for p in ax.patches:
    height = p.get_height()
    if np.isfinite(height):  # empty discipline/count combinations have NaN height and cannot be labelled
        ax.text(p.get_x()+p.get_width()/2., height + 0.1, "{}".format(int(height)), ha="center")
ax.set_ylabel('Student Count', fontsize=14)
ax.set_xlabel('# Lectures Attended Week {}'.format(week), fontsize=14)
plt.legend(loc='upper left')
plt.show()
One note: these percentages are based on the students who answered the survey!
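To make that caveat concrete, here is a small sketch (plain Python, with made-up rows) of normalizing within each discipline: the denominator is the number of respondents in the discipline, not the number of enrolled students. The `responses` list and all variable names are hypothetical.

```python
from collections import Counter

# Made-up survey rows: (discipline, lectures attended that week).
responses = [
    ("Civil", 3), ("Civil", 3), ("Civil", 1), ("Civil", 0),
    ("Mineral", 3), ("Mineral", 2),
]

# Denominator: respondents per discipline (students who skipped the survey are invisible here).
respondents = Counter(discipline for discipline, _ in responses)

# Numerator: respondents per (discipline, lecture count) pair.
pair_counts = Counter(responses)

percent = {
    (discipline, lectures): 100 * n / respondents[discipline]
    for (discipline, lectures), n in pair_counts.items()
}

for key in sorted(percent):
    print(key, f"{percent[key]:.1f}%")
```

With only two Mineral respondents in this toy data, a single answer moves the bar by 50 points, so small disciplines deserve extra caution.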
#define the total counts per discipline
total_students = dict(combined_df['discipline'].value_counts())
#define separate dataframe based on student discipline + attendance counts
agg_students = combined_df.groupby(['discipline','lecture_count']).count()
agg_students = agg_students.reset_index()
agg_students = agg_students[["discipline","lecture_count","section"]].copy()
#agg_students = agg_students[agg_students['lecture_count']!=0]
agg_students['total_count'] = agg_students['discipline'].map(total_students)
agg_students['percent'] = agg_students['section'] / agg_students['total_count'] * 100
plt.figure(figsize=(15,10))
plt.title('Lecture Attendance vs. Student Disciplines Normalized', fontsize = 20)
ax = sns.barplot(x="lecture_count", y="percent", hue="discipline", data=agg_students)
# add labels
for p in ax.patches:
    height = p.get_height()
    if np.isfinite(height):  # skip empty discipline/count combinations (NaN height)
        ax.text(p.get_x()+p.get_width()/2., height + 0.1,"{}%".format(round(height,0)),ha="center")
ax.set_ylabel('% Student Count', fontsize=14)
ax.set_xlabel('# Lectures Attended Week {}'.format(week), fontsize=14)
plt.legend(loc='upper left')
plt.show()
fig, axs = plt.subplots(1, 3, figsize=(14, 7))
fig.suptitle("Week {}: Student Attendance Per Lecture Section Per Engineering Discipline".format(week))
column_list = ["LEC0101_attended","LEC0102_attended","LEC0103_attended"]
lecture_list = [ x.split('_')[0] for x in column_list]
lecture_name = {
"LEC0101": "Kinsella & Goodfellow",
"LEC0102": "Beck",
"LEC0103": "Rosu"
}
# convert discipline columns to numeric (so consistent between lecture sections)
disciplines_num = {
"Chemical":1,
"Civil":2,
"Mechanical":3,
"Mineral":4,
"Industrial":5,
"Materials":6,
"Track-One":7,
"Non-Degree":8
}
combined_df['discipline_num'] = combined_df['discipline'].map(disciplines_num)
for index, value in enumerate(column_list):
    sns.countplot(x=value, hue="discipline_num", data=combined_df[combined_df[value]==lecture_list[index]], ax=axs[index])
    axs[index].get_xaxis().set_visible(False)
    axs[index].set_ylim(0,200)
    axs[index].set_title(lecture_name[lecture_list[index]])
    axs[index].legend(labels = list(disciplines_num.keys()))
    for p in axs[index].patches:
        height = p.get_height()
        if np.isfinite(height):  # read the count from the bar itself instead of parsing its repr
            axs[index].text(p.get_x()+p.get_width()/2., height + 0.1, "{}".format(int(height)), ha="center")
    if index == 0:
        axs[index].set_ylabel('Student Count', fontsize=14)
    else:
        axs[index].set_ylabel(' ')
    #axs[index].set_xlabel('Disciplines', fontsize=14)
plt.show()
fig, axs = plt.subplots(1, 3, figsize=(19, 7))
fig.suptitle("Week {}: Breakdown of Student Attendance by Discipline".format(week), size =16)
column_list = ["LEC0101_attended","LEC0102_attended","LEC0103_attended"]
lecture_list = [ x.split('_')[0] for x in column_list]
lecture_name = {
"LEC0101": "Kinsella & Goodfellow",
"LEC0102": "Beck",
"LEC0103": "Rosu"
}
assigned_lecture = [
len(combined_df[combined_df["Lecture"]=="LEC0101"]),
len(combined_df[combined_df["Lecture"]=="LEC0102"]),
len(combined_df[combined_df["Lecture"]=="LEC0103"])
]
for index, value in enumerate(column_list):
    #filter down the dataframe only to the specific lecturer
    df_temp = combined_df[combined_df[value]==lecture_list[index]]
    df_temp = df_temp.groupby("discipline").agg('count')
    df_temp = df_temp[["section"]].copy().reset_index()
    df_temp['percent'] = df_temp['section'] / assigned_lecture[index] * 100
    sns.barplot(data=df_temp, x="discipline", y="percent", ax=axs[index])
    axs[index].set_ylim(0,100)
    axs[index].set_title(lecture_name[lecture_list[index]])
    values = list(df_temp['percent'])
    for i, p in enumerate(axs[index].patches):
        height = p.get_height()
        axs[index].text(p.get_x()+p.get_width()/2., height + 0.1,"{}%".format(round((values[i]),2)),ha="center")
    axs[index].set_xlabel(' ')
    if index == 0:
        axs[index].set_ylabel('Student %', fontsize=14)
    else:
        axs[index].set_ylabel(' ')
fig.tight_layout()
plt.show()
# how many students attended more than one instructor session last week (i.e. both Kinsella and Rosu lectures)
plt.figure(figsize=(7,7))
plt.title("Self-Reported, % Students Attending More Than One Instructor")
combined_df['multiple_sections'] = combined_df['lecture_sections'].str.contains(",")
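values = combined_df['multiple_sections'].value_counts()
# print(values)
values = values[::-1]
# derive the labels from the data so they always match the number of slices
name = ['Multiple Sections' if flag else 'Single Section' for flag in values.index]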
colors = ['#ADD8E6','#ffcc99','#98AFC7','#ff9999','#ffcc99']
plt.pie(values, labels=name, colors=colors, startangle=-180,autopct='%1.1f%%',shadow=True)
plt.show()
# how many students are not attending their assigned lecture section?
plt.figure(figsize=(12,8))
plt.title('Self-Reported Number of Students Attending Assigned Lecture', fontsize = 18)
ax = sns.countplot(x="lecture_count", hue="lecture_assigned", data=combined_df[combined_df['lecture_count']>0])
for i, p in enumerate(ax.patches):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2., height + 0.1, height ,ha="center")
ax.set_xlabel('# Lectures Attended', fontsize=14)
ax.set_ylabel('# Students', fontsize=14)
plt.show()
# per assigned lecture section: how many students attended their assigned section?
plt.figure(figsize=(12,8))
lecture_name = {
"LEC0101": "Kinsella & Goodfellow",
"LEC0102": "Beck",
"LEC0103": "Rosu"
}
plt.title('Self-Reported Number of Students Attending Each Lecture', fontsize = 18)
combined_df['lecture_name_assigned'] = combined_df['Lecture'].map(lecture_name)
ax = sns.countplot(x="lecture_name_assigned", hue="lecture_assigned", data=combined_df)
for i, p in enumerate(ax.patches):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2., height + 0.1, height ,ha="center")
ax.set_xlabel('Total Students Present per Lecture Section', fontsize=14)
ax.set_ylabel('# Students', fontsize=14)
plt.show()
To interpret the plot above, the number of students present can be found by summing the assigned and not-assigned students. For example, Beck's lecture had 113 + 9 = 122 unique students attending last week.
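The sum described above can be sketched in plain Python. This is an illustration with made-up rows, not the notebook's pandas code: `rows`, `PREFIX_TO_SECTION`, and the exact answer text for LEC01 are assumptions, and multi-select answers are assumed to be comma-separated, as in the `multiple_sections` check.

```python
from collections import Counter

# Answer prefixes and section codes as used in the attendance flags.
PREFIX_TO_SECTION = {"LEC01": "LEC0101", "LEC02": "LEC0102", "LEC03": "LEC0103"}

# Made-up rows: (assigned section, self-reported lecture answer).
rows = [
    ("LEC0102", "LEC02 - Beck"),
    ("LEC0101", "LEC02 - Beck"),
    ("LEC0103", "LEC03 - Rosu,LEC02 - Beck"),
    ("LEC0101", "LEC01 - Kinsella & Goodfellow"),
]

# Count attendees per section, split by whether it is their assigned section.
attendance = Counter()
for assigned, answer in rows:
    for prefix, section in PREFIX_TO_SECTION.items():
        if prefix in answer:
            attendance[(section, section == assigned)] += 1

# Students present in a section = assigned attendees + visitors from other sections.
present = {
    section: attendance[(section, True)] + attendance[(section, False)]
    for section in PREFIX_TO_SECTION.values()
}
print(present)
```

Each student is counted at most once per section here, so the totals are unique attendees per section; a student who visits two sections appears in both totals.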
# how many students in each tutorial section
plt.figure(figsize=(7,7))
plt.title("Student Attending (%) per Tutorial Week {}".format(week))
name = list(dict(combined_df.tutorial_sections.value_counts()).keys())
values = list(dict(combined_df.tutorial_sections.value_counts()).values())
colors = ['#ADD8E6','#ffcc99','#98AFC7','#ff9999','#d7edb9','#c4bcc0','#67805c','#ab446c','#41734f','#eb4034']
explode = (0.2, 0, 0,0,0,0,0,0,0,0)
plt.pie(values, labels=name, colors=colors, startangle=-180,autopct='%1.1f%%',explode=explode)
plt.show()
# how many students did practice problems
plt.figure(figsize=(7,7))
plt.title("% Students Doing Practice Problems Week {}".format(week))
name = list(dict(combined_df.practice_problems.value_counts()).keys())
values = list(dict(combined_df.practice_problems.value_counts()).values())
colors = ['#98AFC7','#ff9999','#d7edb9','#c4bcc0','#67805c','#ab446c','#41734f']
explode = (0, 0.1)
plt.pie(values, labels=name, colors=colors, startangle=-90,autopct='%1.1f%%',explode=explode, shadow=True)
plt.show()
# student view on the midterm review session
plt.figure(figsize=(7,7))
plt.title("Student View on Midterm Review Week {}".format(week))
name = list(dict(combined_df.midterm_review.value_counts()).keys())
values = list(dict(combined_df.midterm_review.value_counts()).values())
colors = ['#d7edb9','#98AFC7','#c4bcc0','#ff9999','#c4bcc0','#67805c','#ab446c','#41734f']
explode = (0.1, 0,0,0)
plt.pie(values, labels=name, colors=colors, startangle=-90,autopct='%1.1f%%',explode=explode, shadow=True)
plt.show()
# how prepared do students feel for term test 1?
plt.figure(figsize=(7,7))
plt.title("Student Reported Comfort with Term Test #1")
name = list(dict(combined_df.term_test_check.value_counts()).keys())
values = list(dict(combined_df.term_test_check.value_counts()).values())
colors = ['#c4bcc0','#ff9999','#d7edb9','#98AFC7']
explode = (0.1, 0, 0)
plt.pie(values, labels=name, colors=colors, startangle=5,autopct='%1.1f%%',explode=explode, shadow=True)
plt.show()
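# flag tutorial attendance from the survey answer (the class-list 'Tutorial' column only holds the assigned section)
combined_df['tutorial_attended_bool'] = np.where(combined_df['tutorial_sections']!="Did Not/Will Not Attend","True","False")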
combined_df.head(2)
| section | section_id | section_sis_id | submitted | lecture_sections | lecture_count | tutorial_sections | practice_problems | midterm_review | practical_check | ... | LEC0101_attended | LEC0102_attended | LEC0103_attended | lecture_assigned | cleaned_answer | polarity | discipline_num | multiple_sections | lecture_name_assigned | tutorial_attended_bool | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | APS106H1-S-LEC0102-20231, APS106H1-S-PRA0106-2... | 272282, 272296, 272315 | APS106H1-S-LEC0102-20231, APS106H1-S-PRA0106-2... | 2023-02-06 15:33:42 UTC | LEC02 - Beck | 3.0 | TUT0107 (Tuesday, 11:00 - 12:00) | No | Very Helpful | Yes | ... | False | LEC0102 | False | True | nervous for the midterm | -1.1 | 7.0 | False | Beck | True |
| 1 | APS106H1-S-TUT0106-20231, APS106H1-S-PRA0106-2... | 272313, 272296, 272282 | APS106H1-S-TUT0106-20231, APS106H1-S-PRA0106-2... | 2023-02-06 15:32:37 UTC | LEC02 - Beck | 2.0 | TUT0106 (Monday, 13:00 - 14:00) | Yes | Didn't attend the lecture or watch the recording | Yes | ... | False | LEC0102 | False | True | good | 1.9 | 3.0 | False | Beck | True |
2 rows × 32 columns
# analyze students identifying as not prepared at all
df_not_prepared = combined_df[combined_df['term_test_check']=='Not prepared at all']
columns = [
'progress_check',
'lecture_count',
'tutorial_attended_bool',
'practice_problems',
'practical_check',
'midterm_review'
]
titles = [
"APS106 Progress Check",
"Number Lectures Attended in Week 4",
"Tutorial Attended",
"Practice Problems Completed",
"Lab Attendance",
"Midterm Review Attended"
]
rows = 2
cols = 3
fig, axs = plt.subplots(rows, cols, figsize=(14, 7))
fig.suptitle("Analysis of Students Identifying as Not Prepared for Term Test",fontsize = 17)
for index, column in enumerate(columns):
    # fill the 2 x 3 grid row by row
    row = index // cols
    col = index % cols
    name = list(dict(df_not_prepared[column].value_counts()).keys())
    values = list(dict(df_not_prepared[column].value_counts()).values())
    colors = ['#ADD8E6','#ffcc99','#98AFC7','#ff9999','#d7edb9','#c4bcc0','#67805c','#ab446c','#41734f','#eb4034']
    # explode = (0.1, 0)
    axs[row, col].pie(values, labels = name, colors = colors, autopct='%1.1f%%', startangle=90)
    axs[row, col].title.set_text("{}".format(titles[index]))
plt.show()
# how are students doing this week?
plt.figure(figsize=(7,7))
plt.title("Student Reported Comfort with APS106 Week {}".format(week))
name = list(dict(combined_df.progress_check.value_counts()).keys())
values = list(dict(combined_df.progress_check.value_counts()).values())
colors = ['#ADD8E6','#ffcc99','#98AFC7','#ff9999','#d7edb9','#c4bcc0','#67805c','#ab446c']
explode = (0.1, 0, 0)
plt.pie(values, labels=name, colors=colors, startangle=-45,autopct='%1.1f%%',explode=explode, shadow=True)
plt.show()
#define the total counts per discipline
total_students = dict(combined_df['discipline'].value_counts())
#define separate dataframe based on student discipline + attendance counts
agg_students = combined_df.groupby(['discipline','progress_check']).count()
agg_students = agg_students.reset_index()
agg_students = agg_students[["discipline","progress_check","section"]].copy()
#agg_students = agg_students[agg_students['lecture_count']!=0]
agg_students['total_count'] = agg_students['discipline'].map(total_students)
agg_students['percent'] = agg_students['section'] / agg_students['total_count'] * 100
progress_numeric = {
'I have fallen behind': 0,
'I am starting to fall behind': 1,
'I am up-to-date': 2
}
agg_students['progress_numeric'] = agg_students['progress_check'].map(progress_numeric)
agg_students = agg_students.sort_values(by=['progress_numeric','discipline'])
plt.figure(figsize=(15,10))
plt.title('APS106 Progress Check vs. Student Disciplines Normalized', fontsize = 20)
ax = sns.barplot(x="progress_check", y="percent", hue="discipline", data=agg_students)
# add labels
for p in ax.patches:
    height = p.get_height()
    if np.isfinite(height):  # skip empty discipline/progress combinations (NaN height)
        ax.text(p.get_x()+p.get_width()/2., height + 0.1,"{}%".format(round(height,0)),ha="center")
ax.set_ylabel('% Student Count', fontsize=14)
ax.set_xlabel('Student Progress', fontsize=14)
plt.legend(loc='upper left')
plt.show()
# course progress vs. practice problem completion
plt.figure(figsize=(12,8))
progress_numeric = {
'I have fallen behind': 0,
'I am starting to fall behind': 1,
'I am up-to-date': 2
}
combined_df['progress_numeric'] = combined_df['progress_check'].map(progress_numeric)
combined_df = combined_df.sort_values(by=['progress_numeric'])
plt.title('Self-Reported Number of Students Course Progress vs. Practice Problems', fontsize = 18)
ax = sns.countplot(x="practice_problems", hue="progress_check", data=combined_df)
for i, p in enumerate(ax.patches):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2., height + 0.1, height ,ha="center")
ax.set_xlabel('Practice Problem Completion', fontsize=14)
ax.set_ylabel('# Students', fontsize=14)
plt.show()
# course progress vs. assigned lecture section
plt.figure(figsize=(12,8))
progress_numeric = {
'I have fallen behind': 0,
'I am starting to fall behind': 1,
'I am up-to-date': 2
}
combined_df['progress_numeric'] = combined_df['progress_check'].map(progress_numeric)
combined_df = combined_df.sort_values(by=['progress_numeric','discipline'])
plt.title('Self-Reported Number of Students Course Progress vs. Lecture Section', fontsize = 18)
ax = sns.countplot(x="lecture_name_assigned", hue="progress_check", data=combined_df)
for i, p in enumerate(ax.patches):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2., height + 0.1, height ,ha="center")
ax.set_xlabel('Assigned Lecture Section', fontsize=14)
ax.set_ylabel('# Students', fontsize=14)
plt.show()
# check general % of students comfortable with the new topics
columns = ['function_return_check', 'while_loop_check']
titles = ['Functions and Returns','While Loops']
fig, ax = plt.subplots(1,len(columns), figsize=(20, 30))
for index, value in enumerate(columns):
    # value_counts() orders categories by frequency, so take the labels from its index
    counts = combined_df[value].value_counts()
    labels = list(counts.index)
    # convert counts to percentages (a plain list cannot be divided by a number)
    data_per = counts.values / counts.values.sum() * 100
    # pull out the largest slice, whatever the number of categories
    explode = [0.1] + [0] * (len(counts) - 1)
    colors = ['#99ff99','#98AFC7','#ff9999','#ffcc99']
    ax[index].pie(data_per, labels = labels, colors = colors, autopct='%1.1f%%', shadow=True, startangle=90,explode=explode)
    ax[index].title.set_text("Students' Understanding of {}".format(str(titles[index])))
plt.show()
# comfort level correlated with attendance level
fig, axs = plt.subplots(1, 2, figsize=(14, 7))
fig.suptitle("Week {}: Student Attendance vs Comfort with Programming Concepts".format(week), fontsize = 17)
columns = ['function_return_check', 'while_loop_check']
titles = ['Functions and Returns','While Loops']
numeric = {
'I do not understand': 0,
'I somewhat understand': 1,
'I completely understand': 2
}
new_column = [ x + '_numeric' for x in columns]
for index, value in enumerate(columns):
    combined_df[new_column[index]] = combined_df[value].map(numeric)
    sorted_df = combined_df.sort_values(by=[new_column[index]])
    sns.countplot(x="lecture_count", hue=value, data=sorted_df, ax=axs[index])
    axs[index].set_title("{}".format(titles[index]))
    for i, p in enumerate(axs[index].patches):
        height = p.get_height()
        axs[index].text(p.get_x()+p.get_width()/2., height + 0.1, height ,ha="center")
    if index == 0:
        axs[index].set_ylabel("# Students", fontsize = 14)
    else:
        axs[index].set_ylabel(" ", fontsize = 6)
    axs[index].legend(loc='upper left')
fig.tight_layout()
plt.show()
# a sample of the unique student answers to the one-word "how are you feeling" question
combined_df['comment_one_word'].unique()[20:24]
array(['Not bad but not good', 'loop\xa0', 'Good', 'Hopeful'],
dtype=object)
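The `'loop\xa0'` entry above ends in a non-breaking space. One way to build word-cloud input without that debris is to clean each answer before joining, rather than editing the printed form of the array. A hedged sketch in plain Python: `comments` and `build_text` are hypothetical, and missing answers are assumed to arrive as non-string values such as NaN.

```python
def build_text(comments):
    """Join the string answers into one lowercase text, normalizing non-breaking spaces."""
    cleaned = []
    for comment in comments:
        if not isinstance(comment, str):
            continue  # skip missing answers (e.g. NaN floats)
        cleaned.append(comment.replace("\xa0", " ").strip().lower())
    return " ".join(cleaned)


# Sample answers taken from the output above, plus one missing value.
comments = ["Not bad but not good", "loop\xa0", "Good", float("nan"), "Hopeful"]
word_text = build_text(comments)
print(word_text)
```

Skipping non-strings also means no literal "nan" tokens reach the word cloud, so they do not have to be stripped afterwards.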
from wordcloud import WordCloud
from wordcloud import ImageColorGenerator
from wordcloud import STOPWORDS
#a word cloud is generated to show the most frequently appearing words
word_text = str((combined_df['comment_one_word'].values))
word_text = word_text.lower()
#remove extra characters
extra_chars = ["xa0","\n","'"]
for char in extra_chars:
    word_text = word_text.replace(char,"")
# define word cloud
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white", collocations=False, width=1000, height=500).generate(word_text)
plt.figure(figsize=(12,8))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
# describe the observed polarity
combined_df[['polarity']].describe()
| polarity | |
|---|---|
| count | 404.000000 |
| mean | 0.606188 |
| std | 1.296430 |
| min | -3.000000 |
| 25% | 0.000000 |
| 50% | 0.000000 |
| 75% | 1.900000 |
| max | 5.700000 |
In general, the polarity is quite neutral, slightly on the positive side (the mean is 0.61 and the median is 0). The maximum polarity (5.7) is also nearly 2x the magnitude of the most negative polarity (-3.0).
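Since the 50% row of the table above is 0, at least half of the answers contain no scored words, and the share of negative, neutral, and positive answers can be more telling than the mean. A minimal sketch with made-up scores (the `scores` list and the `label` helper are hypothetical):

```python
from collections import Counter


def label(score):
    """Bucket a polarity score by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up polarity scores standing in for combined_df['polarity'].
scores = [-1.1, 0.0, 0.0, 0.0, 1.9, 3.8]

# Percentage of answers in each bucket.
shares = {
    name: 100 * count / len(scores)
    for name, count in Counter(label(s) for s in scores).items()
}
print(shares)
```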
print("Most positive feedback, Week {}:".format(week))
for text in combined_df.sort_values('polarity').tail(7)['comment_one_word']:
    print('\n ', text)
Most positive feedback, Week 4: Great! Great Great Not great Love Beck I feel as though the online practical session has been helpful. I also think the TA on the Tuesday 11am-12pm tutorial is helpful. I feel like I have a good understanding of everything conceptually, I just need to practice a bit more to be very comfortable with it.
print("Most negative feedback, Week {}:".format(week))
for text in combined_df.sort_values('polarity').head(6)['comment_one_word']:
    print('\n ', text)
Most negative feedback, Week 4: Terrified Not bad Dangerous frustrating tired Scared for Midterm
#a word cloud is generated to show the most frequently appearing words
word_text = str((combined_df['comment_one_word'].values))
word_text = word_text.lower()
#remove extra characters
extra_chars = ["xa0","\n","'","nan"]
for char in extra_chars:
    word_text = word_text.replace(char,"")
# define word cloud
stopwords = set(STOPWORDS)
long_answer_words = WordCloud(stopwords=stopwords, background_color="white", collocations=False, width=1000, height=500).generate(word_text)
plt.figure(figsize=(12,8))
plt.imshow(long_answer_words)
plt.axis("off")
plt.show()
# bar chart of the most frequently appearing words in the one-word reflections
plt.figure(figsize=(7,7))
plt.title("Top 10 Most Commonly Appearing Words in Reflections")
name = list(dict(long_answer_words.words_).keys())[0:2] + list(dict(long_answer_words.words_).keys())[4:12]
values = list(dict(long_answer_words.words_).values())[0:2] + list(dict(long_answer_words.words_).values())[4:12]
plt.barh(name, values)
plt.show()
# most commonly appearing words in different polarity reflections
filt = [combined_df[combined_df['polarity']<0], combined_df[combined_df['polarity']>0]]
title = ["Words in Negative Reflections", "Words in Positive Reflections"]
fig, ax = plt.subplots(1,2, figsize=(20, 20))
for index, df in enumerate(filt):
    word_text = str((df['comment_one_word'].values))
    word_text = word_text.lower()
    #remove extra characters
    extra_chars = ["xa0","\n","'","nan"]
    for char in extra_chars:
        word_text = word_text.replace(char,"")
    # define word cloud
    stopwords = set(STOPWORDS)
    words = WordCloud(stopwords=stopwords, background_color="white", collocations=False, width=1000, height=500).generate(word_text)
    ax[index].imshow(words)
    ax[index].set_title("{}".format(title[index]))
    ax[index].axis("off")
plt.show()
#a word cloud is generated to show the most frequently appearing words
word_text = str((combined_df['confused_topics'].values))
word_text = word_text.lower()
#remove extra characters
extra_chars = ["xa0","\n","'","nan",]
for char in extra_chars:
    word_text = word_text.replace(char,"")
# define word cloud
stopwords = set(STOPWORDS)
confused_topics_words = WordCloud(stopwords=stopwords, background_color="white", collocations=False, width=1000, height=500).generate(word_text)
plt.figure(figsize=(12,8))
plt.imshow(confused_topics_words)
plt.axis("off")
plt.show()
# bar chart of the most frequently mentioned confusing topics
plt.figure(figsize=(7,7))
plt.title("Top 10 Most Confusing Topics Week {} (density)".format(week))
name = list(dict(confused_topics_words.words_).keys())[0:2] + list(dict(confused_topics_words.words_).keys())[3:11]
values = list(dict(confused_topics_words.words_).values())[0:2] + list(dict(confused_topics_words.words_).values())[3:11]
plt.barh(name, values)
plt.show()